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Abstract 

This paper presents a study made to automatically 
recognize lesion-like regions in black skin medical 
images. Skin lesions have consistently had one of the 
most rapidly increasing incidences of all cancers. Early 
diagnosis is particularly important but it is a challenging 
task, especially for black populations. Moreover, black 
skin specialists are often limited to the use of classic 
macroscopic images for diagnosis purpose. All these 
difficulties are related to black skin pigmentation level 
and the small visual differences between lesion parts and 
healthy ones. 

We propose here a computerized method which identifies 
automatically lesion regions with more accuracy and 
efficiency. It works like an automaton that traverse lesion 
images and automatically classifies healthy regions from 
lesions regions. The designed classifier is a Multi-Layer 
Perceptron Artificial Neural Network (MLP-ANN) 
trained with color and texture features. We made many 
combinations of features and varied the number of 
neurons in hidden layer, in order to obtain best 
performances. Nine features (six of texture and three of 
color) have been retained to train the network. The 
achieved classification performance is 97.2% in both 
training, validation and testing set. 1 

Keywords: Medical images, Black skin, Neural Network, 
Lesion recognition, Texture features. 

Nomenclature 

ANN 
CAD 
MLP 

MLP-ANN 

CADS 
PPV 
NPV 
TPR 
SPC 


Artificial Neural Network 
Computer Aided Diagnosis 
Multi-Layer Perceptron 
Multi-Layer Perceptron type Artificial 
Neural Network 

Computer Aided Diagnosis System 
Positive Predictive Value or Precision 
Negative Predictive Value 
True Positive rate or Sensitivity 
Specificity or True Negative Rate 


1 This study has been implemented on Matlab Neural 
Network Toolbox software at LETIA lab. University of 
Abomey-Calavi in Republic of Benin 


LPR 

false Positive Rate 

ACC 

Accuracy 

LP 

false Positive 

LN 

false negative 

TP 

True Positive 

TN 

True Negative 


1. Introduction 

Dermatology is one of the fields where medical images 
have been used for years, since their potential benefits 
had been revealed in 1992 by Stoecker and Moss [1]. 
However, many parameters such as: type and quality of 
images, noise, or acquisition techniques influence image 
processing. About that, black skin images are certainly 
one for which difficulties occur most. In fact, while 
several studies proved the effectiveness of dermoscopic 
images based diagnosis [2-5], dermoscopy is ineffective 
in black skin and then unusable by black skin specialists. 
Therefore, these specialists are limited to the use of 
standard digital cameras as acquisition devices. Once 
images are acquired, major difficulties appear regarding 
their processing. 

Indeed, in black skin, there is the presence of pigment 
throughout the basal layer of epidermis [6]. Therefore, 
the approach with black population is quite different, 
even if diagnostic procedure is generally the same, 
regardless of skin color [7]. So, it is important to make 
research in the direction of setting up Computer Aided 
Diagnosis (CAD) systems specific to black skin. In this 
context, we proposed in a previous paper, a method to 
analyze black skin macroscopic medical images. In that 
work, we focused on preprocessing and segmentation 
using a combination of mathematical morphology and 
edge detection [8]. The method improved edge detection 
but had also shown its limits. So, further studies deserved 
to be undertaken in order to develop a more efficient 
system. 

With the significant advances occurred during recent 
years in the field of computer science and artificial 
intelligence, medical images processing has become 
more reliable and accurate [9]. Some good applications 
can be found in [10], [11] [12] and [13]. Artificial Neural 
Networks (ANNs) in particular, have been use to solve 
many image processing problems, as reported in [14], [15] 
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and [16]. Among them, we can enumerate: applications 
of ANNs in pre-processing step [17-19]; ANNs trained to 
perform segmentation task [20-22]. ANNs have also been 
used in medical diagnosis as main stage of a CADS [23]. 
Despite these advances, there exist very few studies on 
the subject of black skin, as regards the development of 
CADS. The typical pipeline adopted for automated skin 
lesion diagnosis is: image acquisition, artifact detection, 
lesion segmentation, feature extraction and classification 
[24]. In this paper, we propose a neural network based 
classifier acting in the artifact detection stage. It has been 
trained with color and texture features and works like an 
automaton that traverse black skin lesion images and 
automatically differentiates healthy regions from lesions 
regions. 

The remainder of the paper is organized as follows: 
Section (2) discusses the theory about feedforward type 
neural networks; Section (3) emphasizes on the material 
and the methods used in this work; In Section (4) we 
present the results and make some discussions and finally, 
we conclude in Section (5). 

2. Theory 

An Artificial Neural Network is a computational system 
inspired by the structure, the processing method and the 
learning ability of a biological brain. It is characterized 
by a large number of very simple neuron-like processing 
elements, a large number of weighted connections (links) 
between these elements and a distributed representation 
of knowledge over the connections [25]. Knowledge is 
acquired by the network through a learning process [26]. 
ANNs’ expansion started in 1943 when McCulloch and 
Pitts proved that neuron can have two states and that 
those states could be dependent on some threshold value 
[27]. Since then, ANNs have been widely used in 
research because they can model highly non-linear 
systems in which the relationship among the variables is 
unknown or very complex [28]. A review of various 
classes of neural networks can be found in [29] and [30]. 
Multilayer perceptrons (MLPs) are the most popular type 
of neural networks in use today. They belong to a general 
class of structures called feedforward neural networks, a 
basic type of neural network capable of approximating 
generic classes of functions, including continuous and 
integrable functions [31]. 

MLP Structure: In MLP structure, neurons are grouped 
into layers. The first and last layers are called input and 
output layers respectively, because they represent inputs 
and outputs of the overall network. The remaining layers 
are called hidden layers. Typically, an MLP neural 
network consists of an input layer, one or more hidden 
layers, and an output layer, as shown in Figure 1. 
Mathematical background: Suppose the total number of 
layers is L. The 1 st layer is the input layer, the I th layer is 
the output layer, and layers 2 to L-l are hidden layers. 
Let the number of neurons in £ th layer 
be JV^ = 1,2,...,L and represent the weight of the 

link between j th neuron of i -X th layer and i th neuron of 
I th layer, l<j<N l _ l , and l<i<N £ . Let x, represent the i th 
external input to the MLP, and Z\ be the output of i m 


neuron of i th layer. An extra weight parameter wf 0 is 
introduced for each neuron, representing the bias for i th 
neuron of £ th layer. As such, w of MLP includes W-j , 

j=0,l,..., N t _ l9 i=l,2,..., N r £ =2,3,...,L. w expression is 
given by equation (1): 

w = [ w m w u w i2 - ™n l n l _> 1 (!) 


Layer L 
(output layer) 


Layer L - 1 
(hidden layer) 




Figure 1. Multilayer perceptron (MLP) Neural Network structure 

In a neural network, each neuron with the exception of 
neurons at the input layer receives and processes stimuli 
(inputs) from other neurons. The processed information is 
available at the output end of the neuron. A neuron of 

the/ h layer receives stimuli from neurons of £-l th layer, 
that is: h \ x , h j ' 2 ,..., h^ 1 . Each input is multiplied by the 
corresponding weight parameter, and resulting products 
are added to produce a weighted sum /. (cf. equation 3). 

This weighted sum is passed through a neuron activation 
function <J to produce the final output h. of the neuron 

(cf. equation 2). This output represents the hypothesis (or 
assumption) of the neuron, and it can become the 
stimulus for neurons in the next layer: 

hf=oir!) ( 2 ); = ( 3 ) 

]=o 

Learning process: The purpose of ANN is to learn to 
recognize patterns in data. Once the neural network has 
been trained on samples data, it can make predictions by 
detecting similar patterns in future data. The process by 
which MLP neural network achieves learning or training 
is a two steps: feedforward and backpropagation. 

In the feedforward process, the external inputs are first 
fed to the input neurons 1 st layer, the outputs from the 
input neurons are fed to the hidden neurons of the 2 nd 
layer, and so on, and finally the outputs of L- I th layer are 


2 



Graphics, Vision and Image Processing Journal, ISSN 1687-398X, Volume 16, Issue 3, ICGST LLC, Delaware, USA, Dec. 2016 


fed to the output neurons L th layer. The difference 
between the resulting outputs and the expected ones is 
calculated to get the error. In backpropagation, errors that 
result from previous step are propagated back through the 
system, causing the system to adjust the connection 
weights. The training begins with random weights and 
the goal is to adjust them until the error is minimal. This 
is achieved by minimizing the objective function E of 
equation (4): 

i m 

E = -£(h, L - yi ) 2 (4) 

^ 2=1 

More details about ANN learning process is available in 
[26]. 

3. Material and methods 

Material: We implemented our programs and algorithms 
with MATLAB, Version 8.4 (R2014b). We designed, 
implemented and trained the network with the MATLAB 
neural network toolbox. Our data set includes 40 medical 
images which were taken in same conditions (cf. Table I). 


Table I. Images Database Composition 


Types of Images 

Number 

Erythema lesion 

17 

Tuberous Sclerosis Complex (TSC) lesion 

6 

Purpura lesion 

5 

Seborrheic Keratosis lesion 

7 

Eczema lesion 

5 

Images showing black people skin 

20 


These images were taken with a HD digital camera in the 
Dermatology service of the Hubert K. Maga University 
Hospital of Cotonou. Figure 2 shows samples from each 
type of skin lesion. 



Figure 2. Some black skin lesions of our image database ( a: Purpura 
lesion; c - j : Eczema lesion; e: TSC lesion; g: Seborrheic Keratosis; 
b - d - f - h - i : different types of Erythema) 


We also downloaded images showing black people skin, 
from different online databases. From all these images, 
we extracted blocks of pixels of size 3x3, thanks to a 
program. Each block represents an example of the dataset 
(see figure 3 for extraction procedure). At the end, we 
obtained about 800.000 learning examples that we 


divided into learning, validation and test sets, in the 
respective proportions of 60%, 20% and 20%. 





Figure 3. Regions extraction to blocks of pixels constitution 

Methods: Normal skin regions are homogenous 
connected groups of pixels. So, color and texture 
information can be used to describe them. Fifteen 
features were identified to train the NN, but nine were 
finally retained. So the designed network has nine 
neurons in input layer and one in output layer. We varied 
the number of neurons in the hidden layer until we get 
the best performances. The input layer neurons 
correspond to features calculated for each example (3x3 
pixel block). There are three color descriptors (one for 
each RGB color channel) and six texture descriptors. 

Color features: We calculate the mean intensity value for 
each of the R, G and B channels (Mr, Mg and Mb) with 
formula (5). 

• for each block X (3,3), 

M * = 7T < 5 )’ k = {R,G,B} 

y 2=i 

Texture features: The retained texture features are: 
uniformity, standard deviation, skewness, kurtosis, 
smoothness, and entropy. They were chosen for their 

simplicity and efficiency. Let Z i represent the value of a 
pixel x(z) ; features are defined as follows: 

• Uniformity: it is maximum when all gray levels 
are equal (maximally uniform). 

u = Z p 2 ( z i) ( 6 ) 

2 = 1 

• Standard deviation: measure of average contrast. 

a = '£ j (z i -m) 2 p( z t ) ( 7 ) 

2=1 

• Skewness: it is the measure of the asymmetry of 
the intensity values around the mean intensity. 

s = Z( z . -m )T(z,) ( § ) 

2=1 

• Smoothness: its value is 0 for regions of constant 
intensity and approaches 1 for regions with large 
excursions in the values of its intensity levels. 
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r = 1- 


( 9 ) 

Kurtosis: it is the measure of how outlier-prone a 
distribution is. 


s = £(z,-m) 4 p(z,) (10) 

1=1 

• Entropy: a measure of randomness. 

e = “X P( Z i) l0 §2 P( Z i) (HI 

i=l 

With: 

1 the number of possible intensity levels; 

Zi a random value of pixel intensity; 
p(zi) the histogram of intensity levels in a 
region around zi; 

and m the average intensity defined by: 

m = Z z iP( z i) ( 12 ) 

i=i 


Network parameters: Activation function is the sigmoid 
function given by the formula (13). 


c<r)= 


l 

l+e" r 


(13) 


Sigmoid function is a smooth switch function as shown 
by the curve of Figure 4. It has the property described by 
expression (14). 


c(y) 


asy^+cc 
as y^ -go 


(14) 



count = 0 

WHILE count < NbOfB locks 
count = count +1 

Features = CalcFeatures(Blocks(count)) 
F=C omputeNN (F etaures) 

If (0.5<F<=1) THEN 

Blocks(count)=l 

ELSE 

Blocks(count)=0 

ENDIF 
ENDWHILE 
WRITE Outputlmage 
END 


READ EXTRACTION OF 

START Y *? COLOR *■ BLOCKS OF PIXELS : 

1 M AGE FI LE NbOfBlocks detenu ined 



OD 


H — 


WRITE 

BINARY 

IMAGE 


REBUILT IMAGE FROM 
BINARY BLOCKS 


Figure 5. Flowchart of the proposed recognition technique 


Figure 4. Sigmoid function. 

Functioning of the proposed Technique: the designed 
classifier works like an automaton that traverses black 
skin lesion images and is able to differentiate healthy 
regions from lesions regions. So, when an image is input, 
the first step consists in dividing it into blocks of pixels 
of size 3x3. Then, features are calculated for each block 
and the equivalent function of the classifier is used to 
calculate the output. When the decision of the classifier is 
close to 1, all pixels values are replaced by 1. If, it’s the 
contrary, values are replaced by 0. Thus, at the end we 
obtain a binary rebuilt image (cf. flowchart of figure 5). 
The corresponding pseudocode of all this process is: 
VARIABLES Inputlmage, F, Features[9], NbOfBlocks, 
Blocks [] , Outputlmage. 

PREDEFINED FUNCTIONS CalcFeatures, ComputeNN, 
ExtractB locks, 

BEGIN 

READ Inputlmage 

DO [NbOfBlocks, Blocks] = ExtractBlocks(Inputlmage) 


Evaluation metrics: We used six metrics to evaluate the 
performance of our classifier: Sensitivity, Specificity, 
Accuracy, False Positive Rate, Precision or Positive 
Predictive Value (PPV), Negative Predictive Value (NPV) 
and F 1-Score. To define these metrics, let: TP (true 
positive) stand for the number of regions correctly 
classified as lesion region, TN (true negative) be the 
number of regions correctly classified as healthy regions, 
FP (false positive) the number of regions wrongfully 
classified as lesion and FN (false negative) be the number 
of regions classified as healthy but which are in reality 
lesion regions. 

• Sensitivity or True Positive rate (TPR): 

TPR = — — — xlOO (15) 

TP + FN 

• False Positive Rate (FPR): 

FPR = — — — xlOO (16) 

FP + TN 


Specificity (SPC) or True Negative Rate: 
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TN 

TPR = xlOO (17) 

FP + TN 

• Accuracy (ACC): 

tp+tn 

ACC = x 1 00 (1 8) 

TP + FN + FP + TN 

• Precision or Positive Predictive Value (PPV): 

PPV = — ^ — xlOO (19) 

FP + TP 



Validation ROC 



• Negative Predictive Value (NPV): 

TN 

NPV = x 1 00 (20) 

FN + TN 

4 . Results and discussion 

Results: We got best performances with six neurons in 
hidden layer. With this configuration, the network 
achieved his best validation performance after 457 
iterations and the cross-entropy value of the objective 
function is around 0.083 (see Figure 6). 




False Positive Rate 


False Positive Rate 


Best Validation Performance is 0.083343 at epoch 457 



Figure 6. Training performance. 


Training Confusion Matrix Validation Confusion Matrix 



Target Class 


Target Class 


Test Confusion Matrix All Confusion Matrix 



Target Class 


Target Class 


Figure 7. Confusion matrix. 


Figure 8. ROC Curves of the classifier. 

The classification performance is good, considering error 
rates shown by confusion matrix of figure 7. Indeed, 
confusion matrix help to know percentages of samples 
which are misclassified in each set. According to the 
confusion matrix, the False Match Rate (FMR) in both 
training, validation and test stages is 2.8%. Moreover, the 
performance of the obtained classifier is more illustrated 
by the ROC curves of Figure 8. The form of ROC curve 
shows us that we have an excellent classifier if we refer 
to [32]. Indeed, the greater the area under the curve 
(AUROCC) is, the better the classifier is [33]. Finally, 
the values of evaluation metrics are reported in Table II. 


Table II. Values of Evaluation Metrics 


Metrics 

Values 

True Positive Rate (TPR) or Sensitivity 

96,9% 

True Negative Rate or Specificity (SPC) 

97,5% 

False Positive Rate (FPR) 

2,5% 

Precision or Predictive Positive Value (PPV) 

97,7% 

Negative Predictive Value (NPV) 

96,6% 

Accuracy (ACC) 

97,2% 


Discussion: In our modelling, Is stand for normal or 
healthy skin region and 0s for lesion regions. Thus, we 
can notice that 3.1% of normal skin regions are 
misclassified; whereas only 2.5% of lesion regions are 
misclassified. These values are satisfactory since low 
error rate in classification of lesion regions is particularly 
important for future segmentation purpose. 

The sensitivity of the classifier is: 96,9 %; its Specificity 
is: 97,5% and the False Positive Rate (FPR) is: 2,5%. 
These values are significant, though poor to validate 
machine learning experiments [32]. So, to strengthen our 
results, the following values are considered: Precision or 
PPV: 97,7%; NPV: 96,6%; and Accuracy: 97,2%. Their 
high values (more than 95%) indicate a certain reliability. 
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However, our results need to be compared with results 
obtained by similar work, although there is no specific 
previous work with black skin in the literature, apart from 
ours published in [8]. For this purpose, two works have 
caught our attention: those of (Madan et al., 2011) and 
(Jafari et al., 2016). The first worked on the detection of 
acne-like regions in macroscopic medical images of face 
[13]. The second recently used deep learning to extract 
skin lesions from non-dermoscopic (macroscopic) images 
[22]. A comparison with our results is made in Table III 
on obtained values for three metrics used by both authors: 
best results are bold. 


Table III. Comparison of Detection performances in 
Macroscopic Skin Images 



Papers or Works 

Metrics 

Madan et 

Jafari et 

Our Work 

al., 2011 

al., 2016 

Sensitivity 

90,4 % 

95,2 % 

96,9 % 

Specificity 

88% 

98,8 % 

97,5 % 

Accuracy 

89,2 % 

98,5 % 

97,2 % 


Best results of specificity and accuracy are obtained by 
Jafari et al., whereas best sensitivity value is obtained by 
us. This shows that our recognition performances need to 
be improved, even if the gap is not very big. However, it 
is important to note that both works compared to ours, 
were performed on white skin. This could justify the best 
performances. Indeed, contrasts between healthy regions 
and lesions are much sharper in this kind of skin. 

5. Conclusion 

The major contribution of this article arise from the use 
of ANN for black skin lesions. Indeed, using MLP-ANN 
for recognition tasks is not new but finding the good 
learning features to achieve good performances is the 
main contribution of this research work. The trained 
classifier identifies lesion-like regions in black skin 
medical images. To do this, we first set up a dataset of 
size around 800.000 examples. Texture and color features 
have been calculated for each of these examples. After 
that, we trained the network, while varying the number of 
neurons in hidden layer in order to get best performances. 
Finally we obtained a very good classifier which is able 
to differentiate lesion regions from safe skin regions in 
medical images with a small error rate. 

This study also allowed us to highlight the differences 
between black skin and white skin. These differences 
exist firstly in terms of color, secondly in anatomical and 
structural terms. They induce difficulties to dermatology 
practitioners, as seen in publications made on the subject. 
Texture and color information used here as learning 
parameters, confirms practices of black skin specialists, 
who usually resort to skin palpation (searching texture 
changes) and observation of color changes to make their 
diagnostics. 

But, although the performances of the obtained classifier 
are good, error rate is not less considerable. This is why 
the developed technique is used in "artifact detection" 
stage. Thus, further processing have to be made on 
resulting images before segmentation. 
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Abstract 

This paper presents a robust hybrid image watermarking 
technique using Discrete Wavelet Transform (DWT) and 
Singular Value Decomposition (SVD). The DWT is 
utilized to divide the image into non-overlapping bands. 
The reflectance component of the low-frequency sub-band 
(LL) is extracted using homomorphic transform for each 
color (red, green, and blue). The watermark embedding is 
performed using SVD on the reflectance component of the 
LL sub-band. The results of the proposed watermarking 
technique are compared with some state-of-the-art 
techniques. The comparison test is performed on images 
with different resolutions. It is based on visualization to 
detect any degradations in the watermarked image, Peak 
Signal-to-Noise Ratio (PSNR) of watermarked image, and 
Normalized Correlation (NC) of extracted watermark. 

Keywords: Watermarking, Discrete Wavelet Transform 
(DWT), Discrete Cosine Transform (DCT), Singular 
Value Decomposition (SVD), Homomorphic transform. 

Nomenclature 


DWT 

Discrete Wavelet Transform 

DCT 

Discrete Cosine Transform 

SVD 

Singular Value Decomposition 

MSE 

Mean Squared Error 

PSNR 

Peak Signal to Noise Ratio 

NC 

Normalized Correlation 


1. Introduction 

In recent years, information technology, digital data, and 
multimedia have been easily duplicated, manipulated, and 
distributed. So, it is very important to have copyright 
protection to save owners copyrights. There are several 
protection techniques, one of them is watermarking. 
Watermarking technology is the process of hiding an 
image called watermark or label into original digital data 
(image, video or audio) [1]. Watermarked images must be 
robust enough to survive attacks [1]. Watermarking 
techniques can be classified into two categories; spatial 
domain and transform domain [1]. Several techniques of 


transform domain watermarking are discussed in this 
paper. 

One of these techniques is the DWT [3,5,9]. It is based on 
dividing the image into four non-overlapping bands; 
approximation sub-band (LL), horizontal sub-band (LH), 
vertical sub-band (HL), and diagonal sub-band (HH) 
[3, 5, 6, 9]. 

Another widely-used technique of transform domain 
watermarking is the Discrete Cosine Transform (DCT) 
[11-16]. This transform is used to convert spatial domain 
image into discrete transform domain [14]. Another 
widely-used technique of transform domain watermarking 
is SVD [10,11]. It is an effective numerical analysis tool 
used to analyze matrices. The SVD transformation divides 
the matrix into three matrices with the same size of the 
original matrix; two orthogonal matrices and one diagonal 
matrix [9]. Then, the watermark is embedded in the 
diagonal matrix [6,8-10,16-18]. 

Another domain used for watermarking is the 
homomorphic domain [9]. It is based on image intensity 
represented as light illumination and reflectance of the 
image. Illumination is approximately constant and 
reflectance is variable from image to image. Thus, the 
reflectance of the image is used to perform the 
watermarking process. It must be noted that the SVD 
technique is utilized for avoiding the redundancy resulting 
from the existence of two components of the image in the 
homomorphic domain [9]. 

Hybrid techniques are widely used for watermarking. This 
can be accomplished by combining two or more 
techniques to achieve better results. 

One of these hybrid techniques is the DCT-DWT [14]. It 
is based on dividing the RGB color image into three 
matrices (red, green and blue). The DCT is implemented 
on each color. Watermark embedding is performed on the 
LL sub-band by utilizing the DWT to divide the DCT 
domain into four sub-bands for each color [14-16]. 
Another hybrid technique is the DWT- SVD [2]. It is based 
on utilizing the DWT to divide the image into four sub- 
bands. The LL sub-band is divided into three matrices (red, 
green and blue). Watermark embedding is performed by 
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applying the SVD on each color of the LL sub-band 
[2,6,9,18]. 

The main aim of this paper is presenting a hybrid 
technique to achieve robust watermarking based on [1]. 
The rest of this paper is organized as follows. Section 2 
gives a literature review. The evaluation metrices are 
shown in section 3. Section 4 discusses the proposed 
technique. The simulation results are illustrated in section 
5. Section 6 presents the conclusions followed by the more 
relevant references. 

2. Literature Review 

Surveying the literature there are three main techniques 
most commonly used in color image watermarking: 
Discrete Wavelet Transform (DWT) [3-5,9], Discrete 
Cosine Transform (DCT) [6,9,12,15] and Singular Value 
Decomposition (SVD) [8-10,17]. Hybrid techniques are 
widely used in watermarking. There are two main hybrid 
techniques most commonly used DCT-DWT [7,13-15] 
and DWT-SVD [2]. This paper is the extended paper from 
conference paper [1]. 

Discrete Wavelet Transform (DWT) 

Wavelet transform is an information processing method; it 
has been widely used in many fields including image 
processing. The DWT divide an image into four non- 
overlapping bands. These bands are calculated in different 
frequencies [9]. Figure 1 shows the four sub-bands; 
approximation sub -band (low frequency LL), horizontal 
sub-band (high frequency LH), vertical sub-band (high 
frequency HL), and diagonal sub band (high frequency 
HH). Figure 2 show the low pass and high pass analysis 
filter h[-m],g[-m] [3-5,9]. 


LL 

LH 

HL 

HH 


Figure 1. The DWT sub-bands. 



Figure 2. Two dimensional decomposition using the DWT. 


watermarking technique as watermark is embedded into 
one of these three bands, carrier signal pixel are not 
modified directly [12]. 

Two dimension discrete cosine transform 2D-DCT is 
defined as [7] 


M- 1 N - 1 

F(jk) = a(j)a(k ) II 

m-0 n-0 


COS 


[(2771 + l)y7l] 

[(271 + 1)/C7l] 

[ 2 M \ C ° S 

l 2N \ 


(1) 


Inverse transform 2D-IDCT is defined as [7] 

M-1N-1 

f(mn) = II a(j)a(k)F(jk) cos | 

i-0 y=0 


[(2771 + 1)771] 
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Where 0 < j < M-l, 0 < k < N-l, 

fvi' 7 = 0 

a(j) = \ n and a G) = 

(J-, l<j <M-1 



k = 0 


1 < k < N — 1 


Singular Value Decomposition (SVD) 

SVD is an effective numerical analysis tool used to 
analyze matrices. The SVD transformation matrix divides 
the matrix into three matrices with the same size of the 
original matrix. Then SVD of A is defined as: 

A=USV T (3) 

Where U and V are orthogonal matrices and S is a diagonal 
matrix. The entries in these diagonal matrices are called 
singular value of the matrix A. Applying this technique in 
RGB color images requires image color separation into 
three matrices red, green and blue. Implement the SVD 
watermarking in each matrix and finally reconnect these 
three matrices given the final watermarked color image 
[9]. For embedding a watermark into host image, the SVD 
is performed onto the host image as in (3). The watermark 


is added to the S matrix of the original matrix as: 

D=S+kW (4) 

SVD performed as a new matrix: 

D = U w S w Vw (5) 

The watermarked image is: 

A w =USwV T (6) 

For extract watermark the watermarked image is: 

a* w = u*s;v* T (7) 

The matrix include watermark is computed: 

d* = u w s^vX (8) 

It’s possible to corrupt watermark [6,8,9,17,18] 

W*=(D*-S)/k (9) 


The hybrid technique (DCT-DWT) 

The hybrid technique DCT-DWT is based on utilized the 
DWT to divide DCT domain into four sub-bands [14]. The 
RGB color image is divided into three matrices (red, green 
and blue). The DCT domain is extracted by applying the 
DCT for each color. Embedding watermark is done on 
sub-band LL by utilize DWT to divide the DCT domain 
into four sub-bands for each color [14-16]. 


Discrete Cosine Transform (DCT) 

Discrete Cosine Transform (DCT) is a standout amongst 
the most well known orthogonal change strategies utilized 
as a part of picture preparing. High vitality compaction 
property of DCT is the reason. In watermarking, this 
property helps in choosing the area in picture to insert the 
watermark with most robustness [6]. DCT divides aircraft 
carrier signal into three frequencies bands namely low, 
middle, and heights frequency bands. It is frequency orbit 


The hybrid technique (DWT-SVD) 

The hybrid technique DWT-SVD is based on utilized the 
DWT to divide image into four sub-bands and then 
applying SVD into diagonal sub-band (LL) [2]. The DWT 
is utilized to divide image into four sub-bands. The sub- 
band LL is divided into three matrices (red, green and 
blue). Embedding watermark is done by applying SVD for 
each color of sub-band LL [2,6,9,18]. 
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3. Evaluation Matrices 


All tests were performed using an Intel® core™i5 CPU 
M450 @2.4GHz with 6GB Memory and running 
Windows 7 64-bit operating system and using MATLAB 
8. All the images used are 512x512 colored JPEG images. 
There are five main tests to determine the performance of 
a watermarking technique. Visually test to determine the 
invisibility of watermark and any degradation in quality or 
colors in the watermarked image compared to original 
image. The Peak Signal-to-Noise Ratio (PSNR) of the 
watermarked image and the Normalization Correlation 
(NC) for the extracted watermark are calculated. Applying 
attacks on the watermarked image then extracting the 
watermark and repeating the NC tests again after attacks. 
Calculate the CPU time for embedding algorithm. 

PSNR can be calculated by [9] 

M— 1,N— 1 

M5£ = M^N E (Aw(x,y)-A(x,y)) 2 (10) 

x=0,y=0 

255 2 

PSNR (DB) = 101o glo — — — (11) 


where A is original image, A w is watermarked image and 
M, N size of original and watermarked image. NC 
calculate given by [9] 


w\w 

\\W'\\.\\w\\ 


( 12 ) 


where W is original watermark and W* is extract 
watermark. 


4. The Proposed Watermarking Technique 

The proposed technique presented here is DWT- 
homomorphic based SVD. The DWT is utilized to divide 
image into four non-overlapping bands. The reflectance 
components of the sub-band LL is extracted using 
homomorphic transform for each color (red, green and 
blue). Embedding watermark is done by applying the SVD 
on reflectance components of the sub-band LL. The sub- 
band (LL) intensity can be represented as: 

f_LL(ni,n 2 )= i(ni,n 2 )r(ni,ri 2 ) (13) 

where i(ni,n 2 ) is the light illumination and r(ni,n 2 ) is the 
reflectance of the object to be sub-band imaged. The 
homomorphic transform is performed. 

In [ f_l_l_(ni,n 2 )]= ln[i(ni,n 2 )]+ln[r(ni,n 2 )] (14) 

The LPF and the HPF are applied to log [f_LL (ni, n 2 )] to 
separate the illumination component I and the reflectance 
component R, respectively, for each pixel value in the 
form of matrices. The SVD is performed on the (R matrix). 

R=USV t (15) 

The watermark (W matrix) is added to the SVs of the 
reflectance matrix. 

D=S+kW (16) 

The SVD is performed on the new modified matrix (D 
matrix). 

D=U w SwV T w (17) 

The sub-image (R w matrix) is obtained by using the 
modified matrix (S w matrix) 

Rw=US w V T (18) 

An inverse homomorphic transform is performed on I and 
R w to obtain a matrix X w 


Xw=Rw+l (19) 

The sub-band (LL) of watermarked image can obtained as 
F_LL W =exp (X w ) (20) 

So watermarked image can obtained by reconnect colors 
(Red, Green, Blue) then applying the inverse DWT to 
produce watermarked image F w . 


To extract the possibly corrupted watermark from the 
possibly distorted watermarked image, given U, S w , V 
matrices and the possibly distorted sub-image F w , the 
above steps are reversed as follows: 

The homomorphic transform is performed on the sub-band 
(LL) for watermarked image f_LL w 
The HPF is used to get the possibly corrupted reflectance 
component R* w . 

The SVD is performed on the R* w matrix. 

R*w=U*S*wV* t (21) 

The matrix that includes the watermark is computed. 

D*=U w S*wV T w (22) 


The possibly corrupted watermark is obtained. 


W*= (D*-S)/k (23) 

The proposed technique's embedding algorithm is shown 
in figure 3 and extracting algorithm is shown in figure 4. 
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Figure 3. The embedding algorithm. 
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5. Simulation Results 

There are two experiments for each technique one using 
RGB color JPEG images with size 512x512, resolution 
72x72 dpi and bit depth 24 host image pepper and 
watermark fruits, and the other test using RGB color JPEG 
images with size 512x512, resolution 180x180 dpi and bit 
depth 24 host image Khalid and watermark Rokayya as 
shown in figure 5 where (a) host image pepper, (b) 
watermark image fruits, (c) host image Khalid and (d) 
watermark image Rokayya. Visualization test results for 
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proposed technique compared with the other state of art 
techniques that stated in literature review without attacks 
are shown in figure 6. Figure 7 shows the rotate 30° attacks. 
Rotate 45° attacks are shown in figure 8. Figure 9 shows 
the rotate 60° attacks. Gaussian noise attacks with variance 
parameter 0.01 are shown in figure 10. Figure 11 shows 
the Gaussian noise attacks with variance parameter 0.05. 
Gaussian noise attacks with variance parameter 0.1 are 
shown in figure 12. Figure 13 shows the resize to 256x256 
attacks then resize to 512x512. Motion blur attacks are 
shown in figure 14. Figure 15 shows the disk blur attacks. 
Average blur attacks are shown in figure 16, Figure 17 
shows the JPEG compression 20% attacks. JPEG 
compression 40% attacks are shown in figure 18. Figure 
19 shows the JPEG compression 60% attacks. Crop 
attacks are shown in figure 20. Table 1 and table 2 shows 
the evaluation matrices results (PSNR and NC) without 
attacks and algorithm CPU time for the proposed 
technique compared with the other state of art techniques. 
The NC for extracted watermark after attacks for the 
proposed technique compared with the other state of art 
techniques shown in figure 21 and figure 22. 



Figure 5. (a) host image pepper, (b) watermark fruits, (c) host image 
Khalid, (d) watermark Rokaya. 



Figure 7. Visualization tests after rotate 30° attacks. 



Figure 6. Visualization tests without any attacks. 




£ 


Figure 8. Visualization tests after rotate 45° attacks. 
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Figure 9. Visualization tests after rotate 60° attacks. 
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Figure 11. Visualization tests after Gaussian noise attacks with variance 
parameter 0.05. 



Figure 10. Visualization tests after Gaussian noise attacks with variance 
parameter 0.01. 



Figure 12. Visualization tests after Gaussian noise attacks with variance 
parameter 0.1. 
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ire 13. Visualization tests after resize to 256x256 then resize to 
512x512 attacks. 
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Figure 15. Visualization tests after disk blur attacks. 
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Figure 14. Visualization tests after motion blur attacks. 
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Figure 17. Visualization tests after JPEG compression 20% attacks. Figure 19. Visualization tests after JPEG compression 60% attacks. 



Figure 


18. Visualization tests after JPEG compression 40% attacks. 


Figure 20. Visualization tests after crop attacks. 
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Table 1. The proposed watermarking technique compared with 
other state of arts techniques based on PSNR for watermarked 
image, NC for extracted watermark without attacks and CPU time 
for embedding algorithm for pepper and fruits images. 



DWT 

DCT 

SVD 

DCT- 

DWT 

DWT-SVD 

Proposed 

PSNR 

red 

24.716 

24.727 

25.615 

28.969 

25.620 

35.856 

PSNR 

green 

25.826 

25.838 

26.176 

29.284 

26.187 

38.479 

PSNR 

blue 

27.676 

27.688 

28.222 

31.376 

28.229 

34.896 

NC red 

0.7316 

0.7107 

0.9115 

0.5896 

0.9124 

0.9963 

NC 

green 

0.9673 

0.9364 

0.9924 

0.9785 

0.9943 

0.9980 

NC blue 

0.9145 

0.8861 

0.9594 

0.9541 

0.9634 

0.9945 

CPU 

time 

1.4040 

0.8424 

3.8844 

1.0140 

2.5584 

3.1044 


Table 2. The proposed watermarking technique compared 
with other state of arts techniques based on PSNR for 
watermarked image, NC for extracted watermark without 
attacks and CPU time for embedding algorithm for Khalid and 
Rokayya images. 



DWT 

DCT 

SVD 

DCT- DWT 

DWT-SVD 

Proposed 

PSNR 

red 

22.902 

22.919 

22.944 

28.234 

22.939 

33.301 

PSNR 

green 

23.322 

23.344 

23.388 

29.011 

23.391 

33.793 

PSNR 

blue 

23.304 

23.331 

23.324 

28.618 

23.325 

32.737 

NC red 

0.8876 

0.7207 

0.9955 

0.8942 

0.9954 

0.9976 

NC 

green 

0.9394 

0.7870 

0.9980 

0.9267 

0.9980 

0.9987 

NC blue 

0.9025 

0.7742 

0.9978 

0.9219 

0.9978 

0.9985 

CPU 

time 

1.5288 

0.7956 

3.8064 

1.0296 

2.8236 

3.1200 


As shown from visualization test and experimental results, 
by visual inspection, the results reveal that no noticeable 
variations exist between the watermarked and the original 
images which enforce the fidelity of the proposed 
watermarking technique. The watermarked images and the 
extracted watermarks for the proposed watermarking 
technique have a high PSNR, NC respectively compared 
with the other state of art techniques in the three color 
channels. The embedding algorithm CPU time for the 
proposed technique is higher with few seconds than the 
other hybrid techniques but it is little speed than the SVD 
techniques. So the CPU time for the proposed technique is 
in range of the other state of art techniques. 



(a) NC for extracted watermark images for red channels after attacks 
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(b) NC for extracted watermark images for green cha nn els after attacks 



(c) NC for extracted watermark images for blue channels after 
attacks 

Figure 21. The proposed watermarking technique compared with 
other state of arts techniques for pepper and fruits images based on 
NC for extracted watermark after attacks 



(a) NC for extracted watermark images for red cha nn els after 
attacks 



(b) NC for extracted watermark images for green channels after attacks 



(c) NC for extracted watermark images for blue channels after attacks 
Figure 22. The proposed watermarking technique compared with 
other state of arts techniques for Khalid and Rokayya images based 
on PSNR for watermarked image, NC for extracted watermark after 
attacks 

As shown from visualization test, experimental results and 
evaluation figures after attacks, results illustrate that the 
extracted watermarks for the proposed watermarking 
technique have a high NC compared with the other state of 
art techniques in the three color channels, which 
guarantees watermarking existence. The NC for the 
extracted watermarks for the proposed technique is higher 
compared with the other state of art techniques in the three 
color channels. It means that the proposed technique is 
more robust against attacks. 
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6. Conclusions 

This paper proposes an image watermarking hybrid 
technique using the DWT-homomorphic based SVD 
technique. It embeds the watermark during applying the 
SVD on the reflectance components of the sub-band LL 
by utilizing the DWT to divide the image into non- 
overlapping bands. The reflectance components of the 
sub-band LL is extracted using the homomorphic 
transform for each color (red, green and blue). It is shown 
that we can embed an RGB color image watermark to an 
RGB color host image with the same size. The 
experimental results demonstrated that the DWT- 
homomorphic based SVD watermarking has a high 
fidelity and robustness in the presence of different types of 
attacks. There is always a large probability for watermark 
detection. The embedding algorithm CPU time is in range 
of the other hybrid techniques. The achieved results also 
reveal the superiority of the proposed DWT-homomorphic 
based SVD watermarking technique over other state of art 
watermarking techniques. 
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Abstract 

This paper proposes an integrated method for efficient 
content based image retrieval using color, shape and 
rotational invariant texture features. The present paper 
derived rotational invariant features on each region. To 
derive shape features textons are computed. To represent 
texture features gray level co-occurrence matrix 
(GLCM) features are derived on region based rotational 
invariant texton matrix. These features are combined 
with HSV histograms. The advantage of region based 
models is they are more applicable when working with 
images of large size and especially in real time 
environment. The image retrieval is performed on five 
categories of Wang database and the present method is 
compared with texton co-occurrence matrix (TCM), 
color correlogram gradient (CCG) and GLCM methods. 

Keywords: GLCM , HSV; shape; texture; rotation 
invariant features; 

1. Introduction 

These days there is a huge expansion on browsing of the 
digital libraries or databases. Searching and retrieving 
images from these libraries has become a crucial and 
tedious task for human annotation and this has created 
the dire need of content based image retrieval (CBIR) 
methods. The CBIR methods are capable of retrieving 
the desired images from these libraries based on the 
image contents. The CBIR models makes use of visual 
contents of an image like color, shape, texture mosaic, 
faces and spatial layouts for efficient image retrieval 
(IR). It is highly impossible to represent an image with a 
single best feature and it is due to the fact that user may 
capture photographs from different angles, lighting 
conditions, reflection etc. The traditional image retrieval 
(IR) methods are text based methods. The images are 
retrieved by matching the corresponding index text or 
meta-data associated with images. A comprehensive 
literature survey on CBIR is presented in [1-4]. 

The color content of an image is one of the powerful 
descriptor of CBIR and it can keep semantically intact 


and it is robust to noise, change in size, image 
degradation and orientation. There are various CBIR 
systems that are based on color descriptors [5, 6, 7, 8]. 
The retrieval performance of these degrades on huge 
databases due to color shading problems. One of the 
most visual characteristic feature of the image is the 
texture and texture features plays an important and 
crucial role in many applications like image 
classification [ 9, 10, 11], face recognition [12, 13], 
smoke detection [14], age and facial expressions 
identification [15, 16], pedestrian detection [17, 18] and 
image retrieval[ 19, 20, 21, 22, 23]. Various methods are 
proposed for extracting texture features such as co- 
occurrence matrices [24], local binary patterns [25, 26], 
textons [27] and pattern based methods [28, 29]. These 
methods can be roughly classified into statistical, 
structural and model based method. Most of the pattern 
based methods attempted to retrieve the desired images 
based on the frequencies of each pattern in the image and 
treated them as feature descriptor using histograms. The 
frequency gives information regarding the number of 
times these patterns appeared in the image and it doesn’t 
not reveal any information regarding the mutual 
occurrence of patterns in the image. This is addressed by 
the present paper by making use of textons. 

The IR based on texture descriptors such as Gabor 
transforms [30], rotated wavelet filters [31] are proposed 
in the literature. The other CBIR models are based on 
relevance feedback techniques [32], robust local patterns 
[33], temporal patterns of video sequences [34] and the 
combination of relevance feedback with region based 
features [35]. Recently various pattern based features i.e. 
local maximum edge patterns [36], local tetra patterns 
[37] for natural IR are proposed. The pattern based 
features are also proposed for retrieving of medical 
images i.e. directional binary wavelet pattern [38], local 
mesh patterns [39] and local ternary co-occurrence 
patterns [40]. The block based methods using LBP 
texture descriptors are proposed by Takalo et al. [41] for 
CBIR. The present paper divides the image into multi 
regions and evaluates the features on each region. This 
provides the detailed relative location similarity and 
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reduces the computational complexity. The earlier works 
on CBIR treated the texture and color information as 
individual features. In this work region based rotational 
invariant texture features are integrated with shape and 
color space components for efficient image retrieval. 
The present paper is organized as follows. The second 
section describes the concepts of basic LBP and 
generation of rotational invariant uniform LBP. The 
section three describes the methodology and frame work. 
The section four and five gives the results and 
discussions and conclusions. 

2. Local binary pattern (LBP) 

Ojala et al. [42] introduced a powerful local gray scale 
descriptor called LBP for texture classification. LBP 
utilizes the intensity distribution of local neighborhood 
pixels. The LBP code on a neighborhood is computed by 
comparing the greyscale value of neighboring pixels (g p ) 
with central pixels (g c ) as shown in the Figure 1, based 
on the following equations. 


The LBPg,i operator produces 2 8 different binary patterns 
and this results a total of 256 LBP codes or feature vector 
of length 256. When the image is rotated, the gray level 
values of Pi will correspondingly move along the 
perimeter of the circle around, the central pixel P c . The 
pixel Pi of the neighborhood is mostly assigned the co- 
ordinate position (0, 0) as shown in Figure 2. Rotating a 
particular binary pattern on the perimeter naturally 
results different LBPg codes. This does not apply to the 
constant binary pattern i.e. contains all zeros or all ones 
(00000000 or 1111111 l).To overcome this rotation 
effect and to make the local binary pattern as rotation 
invariant a unique identifier is denoted by obtaining the 
minimum or maximum value by rotating as given in 
equation 3 and 4. 


LBPo 


0 , 1 ... 7 } 


( 3 ) 


LBP ? 


LBP , 


P,R 


s (^Bp 9c) 


>U) = { 


( 1 ) 

( 2 ) 


1 if x >0 
0 otherwise 

Where P is the number of neighboring pixels and R is the 
radius of the neighborhood. A 3x3 neighborhood will 
have P=8 and R=l. The co-ordinates of the 
neighborhood pixels are computed as (RCos(27cP/P, - 
RSin(27iP/P) and their grey levels are estimated by 
interpolation. 


= min{ROR(LBP 8 ,i ) | i 
or 

= max{ROR(LBP 8 , Q\i = 0,1 ... 7} (4) 

Where ROR(z,i) performs a circular bitwise right shift 
on the 8 -bit binary number z, i times. The min(x) or 
max(x) takes out the minimum or maximum LBP code 
from these 8- circular shifts. This becomes the rotation 
invariant LBP (LBP n ). 
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Figure 2: The basic co-ordinate system of a LBP window. 


Table 1 : ULBP n values and indexes on LBP 8 , r. 
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(b)Binary 
pattern 
based on 
eq.2 

Figure 1: LBP code generation. 
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2.1 Derivation of Rotational Invariant ULBP 
(ULBP ri ) 

LBP with P neighboring pixels results into 2 P 
combinations of LBPs. This results a feature vector 
length of 2 P . As the number of neighboring pixels 
increases (16, 1) and (16, 2) the length of feature vector 
increases drastically. The disadvantage of this feature 
vector is its computational cost. To overcome this 
uniform LBP (ULBP) [43, 44] are proposed. The ULBPs 
have limited discontinues i.e. less than or equal to two in 
the circular binary representation and it is proved that 
most of the windows (above 90%) in human faces and 
textures are ULBPs. The remaining patterns where the 
numbers of transitions from 0 to 1 or 1 to 0 are above 
two are considered as non-ULBPs (NULBP). The 
NULBPS are treated as miscellaneous. There will be 
P*(P-1) +3 distinct ULBP on a neighborhood with P 
neighboring pixels. 


Rotational 
invariant 
ULBP on a 3 
x 3 window 
(adjacent Is) 

LBP code 
Value 

according to 
equation 3 

Index 

value 

assigned to 
ULBP ri 

(0000 0001) 

1 

1 

(00000011) 

3 

2 

(00000111) 

7 

3 

(00001111) 

15 

4 

(00011111) 

31 

5 

(00111111) 

63 

6 

(01111111) 

127 

7 

(11111111) 

255 

8 

(00000000) 

0 

9 

All others- 
NULBPS 


0 


There are 36 unique rotation invariant LBPs that occur 
on a 3x3 neighborhood or LBPg,R. It is experimentally 
shown that LBP 8 ^ 6 does not show any good 
discrimination [44]. The performance of these 36 
patterns in discrimination of textures varies greatly 
because some patterns sustain rotation quite well while 
other patterns do not and confuse the analysis. 
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Figure 3: The integrated CBIR model of the present paper 


The varying performance of these LBP n also led to the 
discovery of uniform (U) patterns. A ULBP appears on 
a LBPg,R , whenever there are zero or more (<= 8) 
adjacent ones in any position and the Table 1 summarizes 
the index values that are assigned to ULBP n by the 
present paper. 

3. Methodology 

The present paper proposes a novel frame work for CBIR 
called “multi-region rotational invariant uniform LBP 
texton matrix” (MR-ULBP n -TM) to overcome the 
limitations of LBP, and to capture shape information on 
multi regions. The basic image retrieval model of this 
paper is given in Figure 3. 

The basic LBP operator has the following disadvantages. 
It is designed for a small spatial support area (3x3 
neighborhood); therefore the bit-wise comparison 
between two single pixel values of this neighborhood is 
affected by noise to a great extent. The features 
computed on the basic LBP cannot capture larger scale 
structure (macrostructure) that may have dominant 
features of textures. In this paper the computation on sub 
regions is performed based on average values of sub 
regions, instead of individual pixels. 

3.1. Computation of MR-ULBP ri 

The present paper converts the color image in to HSV 
color space and derives color histograms. The V color 
space of the image is divided into non over-lapped 
regions of size 9x9. Each region is sub divided into nine 


non overlapped sub-regions. The present multi region 
(MR) IR model derives a single value for each 
rectangular sub region. The advantage of the present 
method is it reduces the overall dimension space of the 
derived features. The MR model captures the dominant 
features on a large scale rectangular structure and the sub 
region features are estimated on grey level values of a 
local neighborhood. The steps for computation of MR- 
ULBP n are given below. 

Step one: Replace the each sub region by its average grey 
level value. By this the region of size 9x9 with 9 sub 
regions becomes a 3 x 3 neighborhood, where each pixel 
value represents the average grey level value of that sub 
region. 

Step two: Computation of LBP on each region by 
average operator. The comparison operator between 
single pixels in LBP is simply replaced with comparison 
between average gray-values of sub-regions (threshold). 
This generates a binary pattern. 

Step Three: If the generated multi region-Local binary 
pattern of step two is ULBP then replace the central pixel 
with MR-ULBP n index value as given in table 1. 
Otherwise replace the central pixel with value zero 
(NULBP). 

Note that the scalar values of averages over blocks can 
be computed very efficiently [45] from the summed- area 
table [46] or integral image [47]. For this reason, MR- 
ULBP n feature extraction can also be very fast: it only 
incurs a little more cost than the original 3x3 LBP 
operator. This way, MR-ULBP n code presents several 
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advantages: (1) It is rotational invariant and robust; (2) it 
encodes not only micro structures but also 
macrostructures of image patterns, and hence provides a 
more complete image representation than the basic LBP 
operator; (3) MR-ULBP n can be computed very 
efficiently using integral images. 4) This representation 
is very useful in deriving textons because the image is 
quantized to ten levels (0 to 9); the ULBP n will be given 
indexes from lto 9 and all NULBPs as zero. 

The regions can be small, medium and large i.e. 3 x 3,9 
x 9 and 15x15 neighborhoods respectively. For a small 
scale regions like basic LBP, local, micro patterns of 
textures are well represented, which may beneficial for 
discriminating local details. On the other hand, using 
average values over the large scale regions (15 x 15) 
reduce noise, and makes the representation more robust; 
and large scale information provides complementary 
information to small scale details and much 
discriminative information is also dropped. Normally, 
regions of various scales should be carefully selected and 
then fused to achieve better performance. The present 
paper chose a region of size 9x9 and sub regions of size 
3x3. 


neighboring pixels on a 2 x 2 windows or grid. The 
pixels of the grid are denoted as P, Q, R and S. The five 
types of textons are denoted as Ai, A 2 , A 3 , A 4 and A 5 
(Figure 4). 


p 

Q 

R 

s 


(a) Ai A 2 A 3 A 4 


A 5 

Figure 4: The textons used in this paper (a) 2x2 window of the image 
Aj to A 5 : different textons . 
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3.2. Computation of “Texton Matrix on Multi Region 
Rotational Invariant Uniform LBP (MR-ULBP ri - 
TM)” 

The previous section generates a multi-region based 
ULBP n (MR-ULBP n ) image with ten quantized levels 
or patterns {0 to 9}. The present section evaluates 
textons on this. The LBP and texton based models are 
widely used in many applications [48, 49, 50, 51]. It is 
found that, it is very difficult to obtain satisfactory 
results, of image processing, by designing algorithms 
that process the images based on pixel levels. More over 
this processing system fail in representing the shape 
component totally. To address this Julesz [27] proposed 
the concept of texton’ s. Textons represent the 
relationship between pixels in the form of shape 
component; however defining a texton is still a difficult 
task. Texton is one of the popular and significant shape 
primitives and is defined with certain placement rule. 
The textons represents the emergent and dominant 
patterns on a local neighborhood. 

The image features have a close relationship with textons 
and color diversification. The difference textons may 
form various image features. If the textons in image are 
small and the tonal differences between neighboring 
textons are large, a fine texture may result. If the textons 
are higher and holds quite a few pixels then it results a 
coarse texture and it also depends on scale [49]. In the 
image if the textons are large and contains a small 
number of texton categories, then a shape may result. 
There can be numerous types of textons in image. In this 
paper, we only classify and make use of five special 
types of textons that holds all the three or four 
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Figure 5: computation of MR-ULBP n -TM from MR-ULBP n image. 
(a):MR-ULBP n image ; (b):Detection of textons A h A 2 and A 3 on MR- 
ULBP ri image (c)Computadon of textons A 4 and A 5 on MR-ULBP ri 
image; (d)Formation of MR-ULBP n -TM (Final Texton image) using 
Ai, A 2 , A 3 , A 4 and A 5 . 

The process of texton identification is shown in Figure 
5. The present paper used the five types of textons to 
detect every grid. A particular texton detection process 
is performed on a 2 x 2 grid in overlapped manner 
(shifting right by one column position then row by one 
position down) and if the texton is detected the pixels of 
texton are kept with original values and others are 
replaced with zeros. The same process is repeated for all 
five defined categories of textons. The MR-ULBP n -TM 
(final texton) image (Figure 5(d) will be formed by 
combining these five types of texton images (Figure 5(b) 
&5 (c). 

3.3 COMPUTATION OF GLCM FEATURES ON 
MR-ULBP ri -TM 

On MR-ULBP n -TM image, the co-occurrence matrix is 
formed with a distance D and with an angle 0 0 ,45°,90 0 
and 135°. The GLCM features i.e. entropy, energy, 
contrast, local homogeneity and correlation (equations 
5. 6, 7, 8 and 9) are computed on MR-ULBP n -TM with 
0 0 ,45°,90 0 and 135° orientations and average feature 
values of these orientation are listed in the feature 
library. In order to extract color information the present 
paper also quantized the original image using HSV color 
space. 

Entropy = S!j=o - ln ( p ij) p ij (5) 


Energy = Zf]=o -ln( p ij) 2 

(6) 

Contrast= X[j=o Py 0 — j) 2 

(7) 

Local Homogenity- £lj=o 1+(i V )2 

(8) 

Correlation = Sfj-o p ij 

(9) 


where Pij is the pixel value in position (i,j) of the texture 
image, N is the number of gray levels in the image, p is 
fi — Sy=o iPij mean of the texture image and cr 2 is 
a 2 = TiiJ=o Pij (i — m) 2 variance of the texture image. 

3.4 Image Retrieval Algorithm 

The proposed image retrieval algorithm is given below 
Input: Query image Output; Retrieval of similar images 

1 . Convert the RGB image into HSV color space. 

2. Divide the v-color space image into non 
overlapped regions of size 9x9. 

3. Divide the region in to sub regions and derive 
feature vector (The region of sixe 9x9 becomes 
3x3). 

4. Derive multi region rotational invariant ULBP 
(MR-ULBP n ) index (as given in table 1) image. 

5. Compute texton matrix on multi-region rotational 
invariant ULBP (MR-ULBP n -TM) by deriving 
textons on each 2x2 grid of step 4. 

6. Derive multi-region rotational invariant ULBP 
texton co-occurrence matrix (MR-ULBP n -TCM) 
with various distances on step 5. 

7. Compute GLCM features on MR-ULBP ri -TCM. 

8. Compute the histograms for H, S and V color 
spaces. 

9. Construct feature vector by concatenating 
histograms for H, S and V color spaces with MR- 
ULBP n -TCM features. 

10. Compare the features of query image with the 
images in the database using similarity 
measurement. 

1 1 . Retrieve the images based on nearest distance or 
best matches. 


3.5 Query Matching 

This is accomplished by measuring the distance between 
the query image and database images. The present paper 
used Euclidean distance as the distance measure and as 
given below 

1 /2 

DiSt s (T n ,/ n ) = (Z U =l\fi(T n ) - fjQn) I 2 ) (10) 


Where T n query image, I n image in database; 

The database image is used as the query image in our 
experiments. If the retrieved image belongs to the same 
category as that of query image we say that the system 
has suitably identified the predictable image otherwise 
the system fail to find the image. 
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4. Results and Discussion 

In order to efficiently investigate the performance of the 
present retrieval model, we have considered the Wang 
database [52]. Wang is a subset of Corel stock photo 
database. In the Wang database the images have been 
manually chosen. This data base consists of 5 classes of 
images i.e. Elephants, Fancy Flowers, Horses, Valleys 
and Evening Skies and 100 images per each class. The 
present paper used these 5 classes of images for 
relevance assessment. For a query image the relevant 
images are assumed to be the remaining 99 images of the 
same class. The images from all other classes are treated 
as irrelevant images. The hefty size of each class and the 
heterogeneous image class contents made Wang data 
base as one of the popular database for image retrieval. 
The performance of the present model is evaluated in 
terms of precision and recall rate. Precision is the ratio 
of number of retrieved images (Inr), Vs. the number of 
relevant images retrieved (Irr). The recall is the ratio of 
total number of relevant images in the database (Itr) Vs. 
Irr. 

Precision - P= ( Irr / I N r ) (11) 

Recall -R = ( Irr / Itr) (12) 

The present paper compute GFCM features on MR- 
ULBP n -TCM using various distance values: D= 1,2... 
7 and query matching is performed using Euclidean 
distance. The present retrieval model selects 16 top 
images from the database images that are matching with 
query image. And also experimented with more number 
of top images and retrieval performance is measured. 
Figure 6 shows five examples of retrieval images, i.e. 
one image from each class, by the proposed method with 
D=4 for Inr =16 and top left most image is the query 
image. 
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Figure 6 (b): Retrieved fancy flower images. 
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Figure 6(c): Retrieved horse images. 
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6 (a): Retrieved elephant images. 


Figure 6 (d): Retrieved valley images. 
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Figure 6 (e): Retrieved Evening Skies images. 

Fig 6(a) to 6(e) Retrieved images for each class with D=4 
for Inr =16 on proposed integrated method. 

The average precision and recall rates of all classes of 
images are computed based on MR-ULBP n -TCM 
features and color histograms and listed in Tables 2 and 
3. The best performance of MR-ULBP n -TCM with color 
histograms was obtained when D = 4. The retrieval 
performance of the integrated MR-ULBP n -TCM is 
compared with GLCM [53], color correlogram [54] and 
texton Co-occurrence matrix [49]. The present paper 
selected 60 images of the same category or class as query 
images (one by one) and computed precession and recall 
rates by selecting top 16, 25, 35, 45, 55,65,75,85 and 95 
images. The average precession rates of GLCM, CCG 
and TCM are ranging from 38% to 45%, 39% to 46% 
and 60% to 64% respectively for D=4 and for number of 
images retrieved Inr=16 (Table 2 & 3). The average 
precession and recall rates are plotted in graphs ( Figure 
7 and 8 ) by varying Inr. The present paper also computed 
image retrieval accuracy as defined below. 

IR accuracy A = ((precession + recall) 12) (11) 


Table 2: Average precision rate of all classes of images with various 
distance measures for I NR -16. 



Distance parameter 

Methods 

D=1 

D=2 

D=3 

D=4 

D=5 

D=6 

D=7 

GLCM 

0.38 

0.41 

0.42 

0.45 

0.44 

0.43 

0.43 

CCG 

0.39 

0.41 

0.44 

0.46 

0.45 

0.44 

0.43 

TCM 

0.60 

0.61 

0.63 

0.64 

0.63 

0.61 

0.62 

Proposed 

MR- 

ULBP ri - 

TCM 

0.69 

0.71 

0.74 

0.76 

0.75 

0.72 

0.71 


The average IR accuracy graph with varying number of 
matches considered (Inr ) is plotted (Figure 9). The 
proposed integrated MR-ULBP n -TCM achieved best 
performance when compared to the existing three 
methods. 


Table 3: Average precession rate on each class of images for D=4 for 
Inr -16. 



Image category and the precision (%) 

Methods 

Eleph 

ants 

Fane 

y 

Flo 

wers 

Horse 

s 

Valle 

ys 

Eveni 

ng 

Skies 

Averag 

e 

GLCM 

0.39 

0.42 

0.44 

0.48 

0.5 

0.45 

CCG 

0.4 

0.43 

0.46 

0.49 

0.52 

0.46 

TCM 

0.61 

0.6 

0.66 

0.67 

0.7 

0.64 

Propose 

d 

MR- 

ULBP ri - 

TCM 

0.71 

0.72 

0.76 

0.81 

0.82 

0.76 



Figure 7: Average Performance curve (precision) using GLCM, 
CCG, TCM and MR-ULBP ri -TCM method with D=4. 



Figure 8.: Average Performance curve (recall) using GLCM, CCG, 
TCM and MR-ULBP ri -TCM method with D=4. 
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Figure 9: Average Performance curve (accuracy) using GLCM, 
CCM, TCM andMR-ULBP ri -TCM method with D=4. 


5. Conclusions 

The proposed CBIR model integrated the features from 
texture, shape and color. The present paper derived a 
region based model and evaluated rotational invariant 
features in the form of ULBP n . The proposed model is 
robust and averages can be computed efficiently using 
integral images. The small feature set of multi region can 
make the overall process to be simple and suitable when 
dealing with large size images especially in real time 
environment. The rotational invariant ULBP indexing 
quantizes the image in to 10 levels and these are useful 
in computing texton matrix. The GLCM features derived 
on MR-ULBP n -TCM along with color histograms 
outperformed the earlier methods of image retrieval. The 
proposed method is carried out with varying distances 
and number of retrieved images. The proposed method 
shown high results of retrieval for D=4. 
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Abstract 

Spectral- Spatial classification of hyperspectral image 
suffers from two problems: the existence of various 
feature extraction methods that complicate the choice of 
those applying and the availability of limited number of 
labeled training samples. To overcome these difficulties, 
this paper presents new spectral-spatial classification 
approach for remotely sensed hyperspectral image which 
integrates different spectral and spatial features via multi- 
feature kernels and process accurately with limited 
number of training samples. In fact, the proposed method 
introduces different methods to extract the spectral and 
the spatial features and exploits the oversampling based 
on interpolation techniques to generate new labeled 
samples. First, each pixel must be characterized by two 
spectral vectors computed according to the application of 
the principal components analysis, the independent 
components analysis and three spatial features calculated 
by using three methods: the average of neighbourhood 
pixels, the textural features and the extended multi- 
attribute profiles. Then an oversampling step is 
introduced to create new labeled samples used to train the 
classifier. Finally, a support vector machine (SVM) with 
multi-feature kernel is efficiently trained to generate the 
classification map. The proposed classification approach 
is experimentally evaluated using the AVIRIS Indian 
Pines data set, exhibiting higher performance when 
compared with the multi-feature classification without 
oversampling. 

Keywords: Hyperspectral images, SVM, multi-feature 
kernels, interpolation techniques. 

Nomenclature: 

AA Average Accuracy 

AP Attribute Profile 

ASM Angular Second Moment 

EAP Extended Attribute Profile 


ENT Entropy 

EMAP Extended Multi-Attribute Profile 

GLCM Gray Level Co-Occurrence Matrix 

ICA Independent Components Analyses 

k Kappa coefficient 

LM Local Mean 

MLR Multinomial Logistic Regression 

MP Morphological Profiles 

OA Overall Accuracy 

PCA Principal Components Analysis 

SVM Support Vector Machine 

Var Variance 


1. Introduction 

Recent advances on remote sensing provide images with 
high spectral and spatial resolution. Hyperspectral image 
presents the captured scene in hundreds of narrow 
contiguous bands spanning the visible-to-infrared 
spectrum. For that, hyperspectral data are used in a 
diverse applications such as agriculture [1], astronomy 
[2], surveillance [3] and environmental sciences [4]. 
These applications are based on the classification of each 
pixel in hyperspectral imagery. The objective of the 
classification is to assign each pixel to one of the classes, 
based on its spectral and spatial characteristics. Then, the 
exploitation of the highly informative spectral and spatial 
information of hyperspectral image pixels improves the 
accuracy of the classification. Nevertheless, the 
complexity and the high dimensionality of hyperspectral 
data complicate the classification, thus this technique is a 
challenging task. 

In the last decades, many discriminative classification 
approaches have been developed. Among these, the 
SVM [5] and MLR [6]— [4] have demonstrated to be very 
powerful. In particular, SVM has shown good 
performances for classifying high- dimensional data [7]. 
For that various spectral-spatial classification methods 
based on SVM have been presented in literature. 
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Composite kernels [8], which combine spectral and 
spatial kernels have been used to assure an accurate 
classification. The uses of stocked vector that concatenate 
spectral and contextual information extracted by MP has 
shown a significant improvements [9]. Edge-preserving 
filters have been exploited to develop an accurate 
spectral-spatial classification outperforming the 
classification without filtering [10]. A multi-feature 
model aiming at constructing a SVM set combining 
multiple spectral and spatial features [11] has registered 
an accurate classification. A generalized composite 
kernels have been introduced to improve the performance 
of the classification [12]. Segmentation techniques have 
been investigated in [13]. A multi- feature kernels [14], 
which combine different type of kernels: kernel for each 
feature, have noted a relevant classification. 

The literature review showed that all the studies 
emphasized the importance of the spatial information in 
the hyperspectral image classification. However, the 
availability of various spectral and spatial 
characterization methods (e.g. PCA [15], ICA [16], 
morphological features [17], wavelet-based texture [18], 
GLCM [19]) complicate the selection of the used 
methods. 

Another difficulty has been discussed in the literature, the 
Hughes phenomenon referred to the high dimensionality 
of the hyperspectral data and the availability of a limited 
number of training samples. Then different solutions 
have been proposed to solve this problem. Among these 
we note: the application of features selection [20] and 
extraction methods to reduce the dimensionality of data 
and the uses of semi- supervised learning techniques to 
develop a semi-supervised classification approach based 
on the augmentation of the number of training samples 
from the set of unlabeled pixels. Synthetic data has been 
investigated in [21] to increase the set of labeled samples 
by oversampling, which generate new samples by means 
of interpolation techniques. 

In this context, we propose a multi-feature spectral- 
spatial classification approach based on oversampling 
aims to solve the problem of the limited number of 
training samples and to overcome the difficulty of the 
choice of the adopted characterization methods. The 
proposed approach implements the following three main 
steps: 1) spectral and spatial characterization step that 
introduces different methods to extract the spectral and 
the spatial features, 2) oversampling which exploits 
interpolation techniques to create new labeled examples 
and 3) classification step based on the use of SVM with 
multi-feature kernels that combine different type of 
kernels: kernel for each feature. 

The remainder of this paper is organized as follows. 
Section 2 describes the proposed approach. Section 3 
reports classification results based on real hyperspectral 
data sets. Finally, Section 4 concludes with some remarks. 

2. Proposed approach 

The goal of the proposed approach is to have an accurate 
SVM classification dealing with these two problems: 

- the existence of many features extraction methods and 
the difficulty of the choice of the suitable method, 

- the limited number of training samples. 


For that, we propose to apply an oversampling step 
presented in [21] to increase the number of labeled pixels 
and to use the multi-feature kernels to combine different 
attributes resulted from the application of various 
spectral and spatial feature extraction techniques. 

The proposed approach implements the following three 
main steps: 1) spectral and spatial characterization step 
that introduces different methods to extract the spectral 
and the spatial features, 2) oversampling step that 
exploits interpolation techniques to create new labeled 
examples and 3) classification step based on the use of 
SVM with multi-feature kernels. 

2.1. Spectral and spatial characterization 
The wealthy spectral and spatial information available in 
hyperspectral images allows for the possibility to 
distinguish between spectrally similar materials. Various 
methods have been widely used in the literature for 
spectral and spatial characterizing hyperspectral pixels. 
For the spectral characterization, authors usually used all 
the spectral information or dimensionality reduction 
techniques like PCA and ICA to extract the most 
informative data. For the spatial features extraction, 
different means have been adopted such as: features 
provided from the neighborhood of the pixel, attribute 
filters and textural features. 

In this paper, we focus on the uses of PCA and ICA for 
the spectral characterization and the mean of 
neighborhood pixels, EMAP based on attribute filters 
and textural features for the spatial characterization. 

• PCA: is a statistical procedure that uses an orthogonal 
transformation to convert a set of observations of 
possibly correlated variables into a set of values of 
linearly uncorrelated variables called principal 
components. Then, PCA aims to remove the 
correlation among the bands. In the process, the 
optimum linear combination of the original bands 
accounting for the variation of pixel values in an 
image is identified. 

• ICA: is a popular approach to blind source 

separation, it has been investigated in the analyse of 
hyperspectral images to remove the dependence 
between bands. 

• Average of neighbourhood pixels: this spatial 

characterization technique explains each pixel (p) in 
terms of its neighborhood (pk) in a window (i*j) by 
calculating the average of their spectral information 
X(pk). It return X avg (Equation (1)). 

Xav g (p)=4rZ X (Pk) (!) 

1 J k=l 

• Textural features [22]: emphasize the texture 

structure of the graylevel image. They are local 
indexes computed by means of sliding windows of 
size P x Q. For hyperspectral image, these metrics 
can be found by adopting the panchromatic band, the 
first principal component or a discriminative band. 
Among these features we can note: 

- Focal mean (FM): is computed on the graylevel 
values contained in the sliding window centered on 
the pixel Xij. It return a local texture value Xij LM 
(Equation (2)). 
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^ LM =i I 


PQ, 


( 2 ) 


where w denotes the pixels contained in the window 
centered on Xy. 

- Variance (Var): It return a local texture value Xij Var 
(Equation (3)) 


r= iZ( X pq- } 


PQ P " 


( 3 ) 


where w denotes the pixels contained in the window 
centered on xy and Xij LM the local mean of the 
considered pixel. 

- Entropy (ENT): This measure compute the intensity 
of the texture in the considered image. It is based on 
the GLCM, that represents the relative occurrence 
frequency p(m, n) of two graylevel intensities m and 
n in the P x Q window at a given angular 
neighbourhood (Equation (4)). 

xy^-IIpdvn 2 )k)gp(n 1 ,n 2 ) (4) 


-Angular second moment (ASM): It indicates the 
local contrast, providing an accurate estimate on the 
degree of uniformity of the values of the GLCM 
(Equation (5)). 

Xij ASM = IIp(n„ n 2 ) 2 (5) 


• EMAP [23] is a profile that stacked the EAPs 
obtained using different type of attributes. The EAP is 
resulted by generating an AP (obtained by applying a 
sequence of attribute filters using various thresholds) 
on each of the first p principal components. 


2.2. Oversampling 

To increase the number of training samples, we 
implemented the oversampling algorithm (Algorithm 1) 
[21]. The goal of this algorithm is to generate from the t 
feature vectors yi of dimension dim presenting the set of 
training samples (Y tram =(yi, ...,y t )) g new feature vectors 
presenting the set of new training samples (Y new =(y new i, 
..., y new g )) by means of interpolation techniques. In fact, 
three interpolation methods have been used: linear 
interpolation, cubic spline interpolation and Lagrange 
interpolation. 


Algorithm 1: Oversampling 

READ Y™” 

FOR each row of the matrix Y tra/n i (For i=l to dim) 
Present each value of the row i by a point. 

Compute the interpolation function. 

Generate new samples abscissas. 

Compute new training samples according to the 
evaluation of the interpolation function f in new 
abscissas. 

Save new values in Y new . 

ENDFOR 
PRINT Y new 



Figure 1: Flowchart of oversampling 

3.3. Classification via SVM with multi-feature kernels 

SVMs have been widely adopted in the classification of 
hyperspectral images due to their high performance 
registered on the process of data with high 
dimensionality. SVM [24] is a kernel based classifier 
consisting in projecting data in a higher dimension space 
by means of non-linear mapping function ® and aiming 
at finding the optimal separator hyperplan by margin 
maximization. SVM has been proposed first for binary 
classification, after it has been introduced to solve multi- 
class classification. 

In order to improve the classification performance 
achieved by using the spectral information alone, various 
spectral-spatial classification approaches incorporating 
the spatial information in addition to the spectral 
information have been proposed in the literature. In 
particular, the uses of SVM with kernels that combining 
different type of kernels like composite kernels [8] and 
multi-feature kernels [14] has shown high performance 
in term of accuracy. 

In this paper, each pixel must be characterized by two 
spectral vectors x PCA and x ICA resulted respectively from 
the application of PCA et ICA and tree spatial vectors 
neighborhood^ x Texture anc [ x emap com p U ted respectively after 

the implementation of the mean of neighborhood pixels, 
textural features and EMAP. For that, we implemented 
tree different multi-feature kernels which combining 
these different spectral and spatial attributes: 


• Kernel 1 (Equation (6)): 


Figure 1 shows the flowchart of the oversampling step. 


TfPCA+lCA+neigh+Text+EMAFs \ _ TV- / PCA 

2 \ , Xj ) — JXp CA \X i , Xj ) + 

K ( EMAP EMAP \ 

EMAP t z' ’ I 


) + 
(6) 
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• Kernel 2 (Equation (7)): 

tv- PCA+ICA+Neigh + Text+EMAP / \ _ 

/iiV f A ; . , A j ) — 

P x [ K pca( x ? CA > x j CA ) + K ica ( x ! CA . A )] + (1 - P) x 

> x T ,9h ) + (* 7 “ , X T ) + 

with 0 < ju < 1 . 

• Kernel 3 (Equation (8)): 

6C iV {X i , X j ) — 

«1 x K pca (x? ca , x pca ) + a 2 X K ica (x? a ,x™) + a 3 x 

) + «4 x K rea (x, r “ , ) + a 5 X 

EMAP EMAP 


K 




> x j 


) 


( 8 ) 


where ^ a i = 1 • 

i=l 

To summarize the description of our proposed 
classification method, Algorithm 2 provides a 
pseudocode for our newly developed spectral spatial 
classification algorithm based on a SVM classifier with 
multi- feature kernels and oversampling. 

Algorithm 2: SVMoversampling 


READ Y = (y 1 , y n ) // Pixels of the hyperspectral 


image. 

READ T //Set of training samples. 

//Spectral characterization 
Y PCA = PCA(Y) //Compute the PCA of each pixels. 

T PCA = [T, pca , T c PCA ] //compute PCA of labeled 
pixels. 

Y ica = ICA(Y) //Compute the ICA of each pixels. 

T ICA = [T, ICA , T c ICA ] //compute ICA of learning 
pixels. 

// Spatial characterization 

Y Neigh = Neigh(Y) //Calculate for each pixel the average 
of neighborhood pixels. 

-pNeigh = ypNeigh > pNeigh ^ //Avemge Q f neighborhood 


pixels of training samples. 


yEMAP 

yEMAP 


EMAP(Y) //Compute the EMAP of each pixels. 

j- yEMAP yEMAP j . 


yText _ p ext (Y) //Compute the textural features the of 
each pixels. 



FOR each class i 

Y-2=oversampling(X CA ) 

Y!Z=oversampling(T! CA ) 

Y^f =oversampling(T^ gh ) 

Y^ =oversampling( T™** ) 
Y™=oversamplmg(T™) 

// Spectral and spatial features of training samples after 
oversampling. 


T pca _r '■pPCA V PCA T 
i * E > ^ i new -* 

i-pICA rylCA wICA i 

E ~L -Li > E new J 


y Neigh j- y Neigh y-Neigh j 


r-pEMAP _ rnnEMAP ^EMAP j 
i / 1 i > E new J 

i-pText _r yText wText t 

E ~L E ) E new J 

//Features of training samples in all the classes. 

y PCA _ r JIPCA y PCA y 
A new t 1 * A i J 

y ICA - r y/CA y ICA y 
A new t 1 } A i -I 

y Neigh _ j- pNeigh y Neigh j 


yEMAP r 't'EMAP yEMAP y 

A new t 1 j A i J 

y Text _ I- yrev yText y 

A new t 1 } A i J 

ENDFOR 

L= SVM_Classification( Y PCA , Y ICA , Y Nei ^ h , Y EMAP , Y™, 

y PCA r-pICA y Neigh y EMAP yText \ // Cl/A/T 

new ’ new ’ new ’ new ’ new J '' ^ * 


classification with multi-feature kernels. 
PRINT L //Labels of each pixel. 


Figure 2 illustrate the flowchart of algorithm 2. 

The architecture of the proposed approach is illustrated 
in figure 3. 



Figure 2: The flowchart of algorithm 2. 
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Figure 3: Architecture of the proposed approach. 


3. Experimental results 

To evaluate the performance of the proposed approach, 
we classified the widely used hyperspectral image " 
Indian Pines". It contains 145 x 145 pixels and 200 
spectral bands. The ground-truth data contains 16 classes 
and a total of 10366 labeled pixels. The classification of 
this image is a challenging task refer to the significant 
presence of classes with similar spectral signatures and 
also because of the unbalanced number of available 
labeled pixels per class. 

In all these experiments, we will use the classification 
accuracy (OA) and kappa coefficient (Kappa) as a 
references to evaluate the performance of the proposed 
classification approach. 

For SVM classification, we implemented the most used 
multi-class classification strategy "one against all" and 
we used RBF and polynomial kernels for the spectral and 
spatial features, respectively, to construct multi-feature 
kernels. The training sets are randomly selected from the 
available labeled samples and that the remaining samples 
are used for validation. We optimized the SVM 
parameters using tenfold cross-validation. 

After the spectral and the spatial characterization, each 
pixel has been presented by five vectors: 

- x PCA is a spectral vectors that contains the first five 
principals components. 

- x ICA is a spectral vector containing the six independents 
components. 

_ x Nei g h j s a S p a ti a i vector that contains the average of 
neighborhood pixels in a window of size 3x3. 

- x EMAP a spatial vector that contains the EMAP of each 
pixel. EMAP were built according to the used attributes 
and thresholds presented in [25]: threshold values in the 
range of 2,5% - 10% with a step of 2,5% for the standard 


deviation attribute and thresholds of 200, 500 and 1000 
for the area attribute. 

- x Text is a spatial vector that contains the textural features 
computed from the first three principals components. 
Note that we applied tree sliding windows: 3x3, 9x9 and 
15x15 and we used four directions to calculate these 
features (LM, VAR, ASM and ENT). 

Must indicate that the number of the used principals 
components in x PCA and the number of the adopted 
independents components for x ICA have been 
experimentally fixed according to the spectral 
classification of the image when using 10 labeled 
samples in each class (Figure 4). 

71 
70 
69 

OA 68 
67 
66 
65 

3456789 10 

Number of the used principals components 



(a) 



(b) 

Figure 4. OA variation according to the variation of the number of 
principal components (a) and the number of independent components 
(b). 
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In order to analyze the impact of the multi-feature kernel 
and to select the most suitable for the classification of the 
"Indian Pines" data set, we analyze the performance of 
the proposed method for the tree kernels: 

j^PC A+IC A+N eigh+T ext+EMAP ^j^PCA+ICA+Neigh+Text+EMAP an j 

a j^pcA+icA+Nei g h+Text+EMAP j n a classification with 
generating in each class 10 samples about 10 available 
training samples. Figure 5 shows the OAs and kappa 
obtained by the proposed classification algorithm 
according to the applied kernels. Note that we used 
(1=0.25 for ^J^PCA+ICA+Neigh+Text+EMAP a nd 

(0.1,0. 1,0.2, 0.2, 0.4) for a K PCA+ICA+Nei g h+Text+EMAP . This 
chose has been fixed experimentally (Figure 6). As 
shown in Figure 3, the performance of the proposed 
classification algorithm increases when using 
a j^pcA+icA+Neigh+Text+EMAP w hi c h valorize the spatial 

features extracted by EMAP (as=0.4 > a3=a4=0.2 > ai= 
a2=0.1), it yielded much better OA and kappa results 
(OA= 82% and kappa=81,73%). Furthermore, we can 
note that the weighted summation kernel introducing a 
trade-off (ji) between spectral and spatial kernels with 
|i=0.25 performs more accurately than the direct 
summation kernel. 



Figure 5: Resulted OA and Kappa coefficient according to the 
adopted multi- feature kernel 



(a) 



(ai,a2,a-3,a4,a5) 

(b) 

Figure 6. Variation of OA according to the variation of g for the 
kernel ^K PCA+ICA+Nei g h+Text+EMAP (a) and (*,02,03,04,05) for the 
kernel a K PCA+ICA+Nei g h+Text+EMAP (b). 


To show the advantage of the oversampling, we note in 
Table I the classification results obtained for different 
number of training samples after oversampling and 
without oversampling. In this experiment, we used cubic 
spline interpolation for the spectral and the spatial 
features to create new labeled samples and we adopted 
uK p cA + icA+Neigh+EMAP+Text as a mu lti-feature kernel. 

Table I illustrate the average of the OA followed by the 
standard deviation ( ± ) and the kappa coefficient 
obtained after ten Monte Carlo runs. By adopting 
oversampling, the proposed method significantly 
improved the classification results obtained by the 
considered classification without oversampling for all the 
adopted size of training set (10, 20 and 30). For instance, 
the generation of 40 samples about 10 labeled examples 
obtained an OA of 85.09%, 6.09% larger than that 
obtained by SVM without oversampling. As a result, the 
obtained samples after oversampling improves the 
accuracy of the supervised classifier (SVMs with multi- 
feature kernel). Notice also that the increase in the 
number of generated data improves significantly the 
performance of the classification (Figure 7) which 
indicates the advantage these samples that increase the 
ability of SVM to find the optimal separator hyperplan. 

Table 1: OA and kappa coefficient (in parenthesis) 
obtained for the Indian Pines data set 


Labeled 

samples 

Number of generated samples 

0 

10 

20 

30 

40 

10 

79% 

± 1.3 
(0.789) 

81.32% 

± 1.25 
(0.81) 

82.5% 

± 1.4 
(0.821) 

84.04% 

± 0.54 
(0.83) 

85.09% 

± 0.64 
(0.849) 

20 

85.56 

±1.2 

(0.86) 

87.86 
± 1.11 
(0.878) 

88.39% 

±0.66 

(0.89) 

89.54% 

± 0.67 
(0.9) 

90.34% 

±1.1 

(0.91) 

30 

87.68 

± 0.92 
(0.85) 

89.34% 

±1.07 

(0.896) 

90.19% 

±1.15 

(0.907) 

91.3% 

± 1.1 
(0.92) 

92.11% 

± 0.56 
(0.93) 



Generated set size 

(a) 


20 BO 

Generated s et size 


^ _ . 10 labeled samples 
| - 20 label e d s ampl es 
^ — 30 labeled samples 



labeled samples 

- | - 20 labeled samples 
A 30 labeled samples 


(b) 

Figure 7: Variation of OA (a) and kappa coefficient (b) according to the 
increase in generated set size. 
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Focusing on the oversampling step, we have indicated in 
the description of the proposed classification method that 
we have implemented tree interpolation techniques to 
generate new training data: linear interpolation, cubic 
spline interpolation and Lagrange interpolation. Then to 
analyze the impact of these techniques and to select the 
most adequate for each feature, we note in table 2 the 
spectral, the spatial and the spectral- spatial classification 
results obtained after the generation of 20 data from 10 
labeled samples by applying different interpolation 
methods. Table 2, illustrates for each interpolation 
method the OA, AA, kappa statistic coefficient (k), and 
individual class accuracy (in percent) results achieved by 
the spectral classification ( two spectral classifications: 
using characteristics extracted by PCA and characteristics 
extracted by ICA), the spatial classification (tree spatial 
classifications: using characteristics extracted by the 
average of neighborhoof pixels, textural features and 
characteristics extracted by EMAP) and the spectral- 
spatial classification assured by the proposed method. It 
is remarkable that the uses of Lagrange interpolation to 
create new spectral features (PCA, ICA) and spatial 


features extracted by EMAP and the average of 
neighborhood pixels leads to have more accurate spectral 
and spatial classifications than that resulted after the 
application of the other methods: cubic spline and linear 
interpolation. For textural features, it's clear that the 
linear interpolation provided the highest accuracy. Then, 
for the spectral-spatial classification we applied the 
Lagrange interpolation for feature extracted by PCA, 
ICA, Neigh, and EMAP and the linear interpolation for 
the textural features. This combination provided high 
performance, it obtained an OA of 83.87% and kappa 
coefficient of 0,8385, 1.37% and 0.017 larger than these 
obtained when we used cubic spline interpolation for all 
features (indicated in table 1). This indicate that the 
samples generated by this combination are properly 
created refer to their similarity to the oversampled data in 
each class. 

Figure 8 shows the ground truth and the classification 
result obtained without-oversampling and by the 
proposed method for the AVIRIS Indian Pines scene. The 
advantage of the proposed classification approach is 
clearly appreciable in this figure. 


Table 2: Table 2: OA, AA an kappa coefficient obtained for the AVIRIS Indian Pines data set 


Class 

Test 

samples 

Classification with oversampling 















Spectral classification 


Spatial classification 



ACP 

ACI 

EMAP 



Lin 

Spl 

Lag 

Lin 

Spl 

Lag 

Lin 

Spl 

Lag 

Alfalfa 

44 

77,27 

79,55 

79,55 

86,36 

86,36 

79,55 

86,36 

86,36 

86,36 

Com-notill 

1424 

22,33 

8,64 

15,45 

27,46 

21,35 

23,88 

30,34 

30,34 

30,68 

Com-mintill 

824 

22,21 

21 

38,47 

34,10 

28,40 

43,45 

34,10 

27,55 

27,55 

Com 

224 

72,32 

70,54 

82,59 

68,30 

65,18 

73,21 

41,07 

49,55 

49,55 

Grass/pasture 

487 

68,38 

65,50 

58,11 

84,80 

82,75 

75,15 

82,75 

82,75 

82,75 

Grass/trees 

737 

81,68 

84,12 

86,43 

83,18 

88,60 

87,38 

87,89 

87,89 

85,89 

Grass/pasture- 

mowed 

16 

81,25 

87,50 

87,50 

68,75 

87,50 

87,50 

93,75 

93,75 

93,75 

Hay-windrowed 

479 

88,10 

89,14 

81 

66,39 

68,89 

89,77 

91,65 

91,65 

91,65 

Oats 

10 

90 

90 

100 

80 

90 

100 

100 

90 

100 

Soybean-no till 

958 

43,63 

46,56 

34,76 

15,76 

48,33 

13,47 

48,15 

48,15 

55,5 

Soybean-min till 

2458 

51,51 

31,08 

44,26 

48,41 

8,01 

57,28 

48,70 

48,70 

48,70 

Soybean-clean till 

604 

43,21 

36,26 

44,87 

45,20 

26,16 

64,57 

66,23 

66,23 

66,23 

Wheat 

202 

63,37 

74,75 

79,70 

52,97 

56,93 

68,81 

95,05 

95,05 

95,05 

Woods 

1284 

28,27 

47,51 

47,90 

47,66 

52,65 

58,18 

75,86 

75,86 

75,86 

Bldg-Grass- 

Trees-Drives 

370 

40 

62,16 

50,81 

18,92 

26,22 

20,54 

62,16 

71,35 

71,35 

Stone- Steel- 
Towers 

85 

91,76 

91,76 

97,65 

90,59 

91,76 

91,76 

91,76 

97,65 

97,65 

OA (%) 


69,59 

68,84 

70,59 

68,43 

66,61 

71,44 

72,11 

72,98 

74,02 

AA (%) 


61,63 

61,63 

64,32 

57,43 

58,07 

64,66 

69,82 

70,67 

71,15 

kappa 


0,6026 

0,5872 

0,6089 

0,6152 

0,6181 

0.67 

0,721 

0,721 

0,721 
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Table 2 (suite) 


Class 

Test 

sam 

pies 

Classification with oversampling 

Spatial classification 

Multi-feature classification 

Neighborhood 

Textural features 

Lin 

Spl 

Lag 

Lin 

Spl 

Lag 

Without 

oversampling 

Lag_Lag_Lag_ 
Lag Lin 

Alfalfa 

44 

88,64 

86,36 

86,36 

84,09 

88,64 

84,09 

90,91 

93,18 

Com-notill 

1424 

19,87 

21,14 

22,68 

9,27 

13,41 

10,60 

30,34 

38,97 

Com-mintill 

824 

28,28 

31,92 

27,55 

16,38 

17,23 

24,51 

34,59 

57,28 

Com 

224 

45,54 

41,52 

49,55 

41,07 

41,07 

37,05 

82,59 

87,05 

Grass/pasture 

487 

47,23 

45,38 

60,57 

44,35 

40,66 

35,93 

74,33 

82,96 

Grass/trees 

737 

82,77 

81,82 

85,89 

31,34 

31,48 

27,68 

89,69 

93,49 

Grass/pasture-mowed 

16 

93,75 

93,75 

93,75 

81,25 

87,50 

87,50 

87,50 

87,50 

Hay-windrowed 

479 

94,36 

93,53 

91,65 

59,71 

60,54 

55,74 

92,28 

95,82 

Oats 

10 

100 

100 

100 

80 

80 

90 

100 

100 

Soybean-no till 

958 

49,16 

49,69 

49,58 

27,56 

34,66 

43,01 

55,22 

68,06 

Soybean-min till 

2458 

48,58 

48,17 

48,70 

45,77 

45,36 

42,35 

52,73 

59,72 

Soybean-clean till 

604 

70,03 

69,04 

66,23 

40,89 

30,79 
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Figure 8: Classification maps obtained for the AVIRIS Indian Pines scene 
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4. Conclusion 

In this paper, we have presented a new spectral-spatial 
classification approach which combines various spectral 
and spatial features via multi-feature kernels and generate 
new training samples to solve two problems widely 
proposed in the classification of hyperspectral images 
which are the difficulty of the choice of the applied 
characterization methods and the availability of a limited 
number of labeled samples. It investigates the 
oversampling based on interpolation techniques to 
increase the size of training set. By using the kernel 
a j^pcA+icA+Neigh+Text+EMAP w hi c h valorize the spatial 

features computed by EMAP and by adopting the 
Lagrange interpolation for features extracted by PCA, 
ICA, the average of neighborhood pixels and EMAP and 
the linear interpolation for textural features in the 
oversampling step, the proposed method provides good 
accuracies when compared with the spectral 
classification, the spatial classification and the 
classification without oversampling. Combining multi- 
feature kernels and oversampling provides competitive 
and encouraging results. Further work should be focused 
on the exploitation of active learning algorithms to 
improve the quality of the generated samples in the 
oversampling step. 
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